Database reference guide

Loading Data Limits and Constraints

This section provides a guide to the performance characteristics of Engine when loading and indexing data.

Row and Column Limits

Column Limit

There is a hard limit of 32,768 columns per table that can be imported into Engine via Loader. This is the coded limit in Engine and cannot be increased further without significant architectural changes.
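As an illustration of the limit above, a source file's header row can be checked before an import is attempted. This is a minimal sketch, not part of Engine or Loader: it assumes the source is a delimited text file whose first line names the columns, and the function names are hypothetical.

```python
import csv
import io

# Hard column limit per table, as stated in this section.
MAX_COLUMNS = 32_768


def column_count(header_line: str, delimiter: str = ",") -> int:
    """Count the fields in one delimited header line (hypothetical helper)."""
    reader = csv.reader(io.StringIO(header_line), delimiter=delimiter)
    # An empty header line yields no rows, so fall back to an empty list.
    return len(next(reader, []))


def within_column_limit(header_line: str, delimiter: str = ",") -> bool:
    """Return True if the header has no more columns than the limit allows."""
    return column_count(header_line, delimiter) <= MAX_COLUMNS
```

For example, `within_column_limit("id,name,postcode")` returns True, while a header with 32,769 fields would be rejected.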

Generally, when loading a large database, the more memory that is available to the system, the faster the load will progress.

To determine whether adding more memory to the system would be of any benefit, use the following approach.

Calculate the storage requirements for the largest field in the system (this will be a wide, high-cardinality text field if it exists). If this is less than the available memory on the system, adding memory will not improve the performance of the load process.
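The comparison described above can be written down as a short check. This is only a sketch: the storage requirement of the largest field has to be estimated separately, the function name is hypothetical, and the case where the two values are exactly equal is not covered by the text, so it is treated here as "more memory may help".

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def more_memory_may_help(largest_field_bytes: int, available_memory_bytes: int) -> bool:
    """Apply the rule of thumb from this section (hypothetical helper).

    If the largest field already needs less than the available memory,
    adding memory is not expected to speed up the load.
    """
    return not (largest_field_bytes < available_memory_bytes)


# A field needing 6 GiB on a machine with 4 GiB available: more memory may help.
print(more_memory_may_help(6 * GIB, 4 * GIB))
# A field needing 2 GiB on the same machine: no benefit expected.
print(more_memory_may_help(2 * GIB, 4 * GIB))
```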

The load process is broken down into two sections:

  • Loading
  • Indexing

Load Speed

The speed at which the data is loaded depends on the following:

  • The cardinality of the data. The higher the cardinality, the longer the time to load.
  • The location of the source data - data will be loaded faster if the source data is on a different disk than the Engine repository.
  • The number of columns - the more columns, the longer the load time. It is more efficient to load rows than columns.
  • The speed of I/O on the repository disk.

Index Speed

The time to index the data depends on the memory available for indexing and the size of the field to be indexed. The fastest possible index times are achieved when the index can be fully created in RAM with one pass. If this is not possible then the index will be created in blocks.

If it is not possible to create the index in one pass, the data will be repeatedly broken down into equal chunks until the amount of memory required to create the index drops below the memory available on the system.
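The subdivision described above can be sketched as a simple loop. This is not Engine's actual algorithm: it assumes that each subdivision halves the chunk size and that the memory needed to index a chunk scales linearly with its size. Neither assumption is stated in this guide, and the function name is hypothetical.

```python
def chunks_needed(index_memory_bytes: int, available_memory_bytes: int) -> int:
    """Estimate how many equal chunks the data is split into for indexing.

    Assumptions (not from the guide): every subdivision doubles the number
    of chunks, and a chunk needs index_memory_bytes / chunks bytes.
    """
    if index_memory_bytes < 0 or available_memory_bytes <= 0:
        raise ValueError("memory sizes must be non-negative and available memory positive")

    chunks = 1
    # Keep splitting until the per-chunk requirement drops below the
    # available memory. The comparison is multiplied through by `chunks`
    # so that it stays in exact integer arithmetic.
    while index_memory_bytes >= available_memory_bytes * chunks:
        chunks *= 2
    return chunks
```

Under these assumptions, an index that needs 10 GB on a system with 4 GB available would be built in four chunks of 2.5 GB each, whereas one that needs 3 GB would be built in a single pass.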

Size of Loaded Data

The size of loaded data can be up to twice the size of the raw data and depends on the characteristics of the data and how much of the data is indexed. The size of the loaded data can be estimated using the tables in the Data Types section.

Disk Space

Before loading data, ensure that the available free disk space is at least twice the size of the raw data. If Engine runs out of disk space, the load will fail.
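The free-space rule above can be checked before a load is started. The sketch below uses Python's standard `shutil.disk_usage` call; the function names and the repository path argument are hypothetical, and the check is not part of Engine itself.

```python
import shutil


def required_free_bytes(raw_data_bytes: int) -> int:
    """Free space asked for by the rule above: twice the raw data size."""
    return 2 * raw_data_bytes


def has_room_for_load(raw_data_bytes: int, repository_path: str) -> bool:
    """Return True if the disk holding repository_path has enough free space."""
    free = shutil.disk_usage(repository_path).free
    return free >= required_free_bytes(raw_data_bytes)
```

For example, `has_room_for_load(50 * 1024**3, "D:\\repository")` would report whether at least 100 GiB is free on the disk that holds that (hypothetical) repository folder.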

Compressing Data

Turning on Windows disk compression will adversely affect performance, although it can provide significant savings in disk space.

Integer Field

The limits of an integer field are approximately 2^31.
